Portraits of complex networks
Abstract. - We propose a method for characterizing large complex networks by introducing a
new matrix structure, unique for a given network, which encodes structural information; provides 
useful visualization, even for very large networks; and allows for rigorous statistical comparison 
between networks. Dynamic processes such as percolation can be visualized using animation. 



Introduction. — Large, complex stochastic networks 
are conspicuous in science and everyday life and have
attracted a great deal of interest [1-3]. A difficult problem
when studying networks is that of comparison and
identification. Given two networks, how similar are they? Could
they have arisen from the same generating mechanism?
Given a real-world network, such as a protein-protein
interaction network, or an electric power grid, say, how can
one determine which stochastic network model most
accurately captures its relevant structure? Is there a
reasonable way to illustrate what a particular network looks
like?

A network, or graph, is characterized completely by its
adjacency matrix: an $N \times N$ matrix whose nonzero
entries denote the various links between the graph's $N$ nodes.
This representation, however, is not unique, in that it
depends on the actual labeling of the nodes, and graph
isomorphs (identical graphs with permuted labels) cannot
be readily distinguished from one another [4]. The same
is true of graphical representations, where node placement
is arbitrary (Fig. 1).

In this letter, we propose a new method for recognizing
and characterizing large complex networks that is
independent of labeling and circumvents the problem of graph
isomorphism. For each network we compute its B-matrix:
a signature that represents the network reliably and serves
as its 'portrait.' We thus have, for the first time, a means of
recognizing networks at a glance and judging their differences
and similarities, greatly increasing our understanding
and intuition [6]. We also introduce a "distance,"
derived from the B-matrix, that quantifies network
differences, rendering comparisons mathematically meaningful.
One important application is to the comparison of
phylogenetic trees representing various organisms [5].

Fig. 1: Planar embeddings and adjacency matrices for a small
network. It is difficult to tell visually that these represent the
same network, even at such a small size.

Portraits. — A graph G consists of a finite set of 
nodes, or vertices, $V = \{v_1, v_2, \ldots, v_N\}$, and a set of edges,
or links, between pairs of vertices, $E = \{(v_i, v_j)\}$. In
applications, the vertices label elements of a network, and
edges denote relationships between elements. The number
of links, $k_i$, connected to a vertex $v_i$ is the degree of
the vertex. Much recent interest has focused on scale-free
networks, which exhibit a power-law degree distribution,
$P(k) \sim k^{-\gamma}$. Despite its strong influence on various
properties, the degree distribution is but one of many
characteristics. Two large networks may possess similar degree
distributions yet differ widely in clustering (the extent to
which neighbors of a node connect to one another) [1],
assortativity (the frequency of connections between nodes of
like degrees) [7], and other important properties.

We now introduce the B-matrix. Define the distance
between two nodes as the smallest number of links
connecting them, found using Breadth-First Search (BFS) [8].
Thus, a node $v_i$ is surrounded by $\ell$-shells: the subsets of
nodes at distance $\ell$ from $v_i$. Let

$$B_{\ell,k} = \text{number of nodes that have exactly } k \text{ members in their respective } \ell\text{-shells.} \tag{1}$$

Note that B is independent of node labeling: all isomorphs
of a graph have exactly the same B-matrix. Enumerating
the shell members of a specific node requires $O(N)$ steps
for a sparse graph [8], thus construction of the B-matrix
requires $O(N^2)$ steps. Example B-matrices are shown in
Figs. 2-8 and discussed in the Results section.
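As an illustration of the definition above, the following minimal Python
sketch builds a B-matrix with one breadth-first search per node. The
adjacency-dictionary input, the function names `shell_sizes` and `b_matrix`,
the sparse dictionary storage, and the choice to count only nonempty shells
($\ell \ge 1$, $k \ge 1$) are assumptions made here for illustration; they are
not taken from the original implementation.

```python
from collections import Counter, deque


def shell_sizes(adj, source):
    """Return {distance: number of nodes at that distance from source}, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    sizes = Counter(dist.values())
    del sizes[0]  # drop the trivial 0-shell (the source itself)
    return sizes


def b_matrix(adj):
    """Return B as a dict: B[(l, k)] = number of nodes with exactly k nodes in their l-shell.

    Assumption of this sketch: only nonempty shells (k >= 1) are recorded.
    """
    B = Counter()
    for node in adj:
        for l, k in shell_sizes(adj, node).items():
            B[(l, k)] += 1
    return dict(B)


# Hypothetical toy example: the path graph a - b - c, and a relabeled copy.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
relabeled = {1: [2], 2: [1, 3], 3: [2]}

B = b_matrix(path)
```

For the path graph, two nodes have one neighbor and one node has two, so the
first row holds `B[(1, 1)] = 2` and `B[(1, 2)] = 1`; the relabeled copy gives
the same dictionary, as expected from label independence.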

It is easy to see, from Eq. (1), that the degree distribution
of a graph is encoded in the first row of its B-matrix,

$$B_{1,k} = N P(k), \tag{2}$$

since the degree of a node equals the number of neighbors
in its $\ell = 1$ shell. Generalizing this concept, we define the
degree of order $\ell$ of a node as the number of members in
its $\ell$-th shell. Then, row $\ell$ of the B-matrix lists the graph's
distribution of degrees of order $\ell$:

$$B_{\ell,k} = N P_\ell(k). \tag{3}$$

Consider a maximally random network, constructed by
the Molloy-Reed algorithm [9]. Its structure is fully
determined by its (first-order) degree distribution, or by the
first row of its B-matrix. For example, the second row is

$$B_{2,k} = \sum_{l} B_{1,l} \sum_{\substack{j_1, j_2, \ldots, j_l \\ j_1 + j_2 + \cdots + j_l = k + l}} \prod_{i=1}^{l} p_{j_i}, \tag{4}$$

where $p_m = m B_{1,m} / \sum_n n B_{1,n}$. Thus the B-matrix
contains much additional information beyond the degree
distribution, encoded in the difference between the actual
$B_{2,k}$ and the expression (4) (and similarly for higher rows).
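The random-network prediction for the second row can be evaluated
numerically. The sketch below is one possible reading, not the authors' code:
it assumes a locally tree-like network, so that a neighbor of degree $j$
contributes $j - 1$ nodes to the second shell, and it rewrites the constrained
sum as an $l$-fold convolution of the shifted distribution $q_j = p_{j+1}$.
The function names and the list-based input (`first_row[k]` holding $B_{1,k}$
for $k = 0, 1, \ldots$) are hypothetical.

```python
def convolve(a, b):
    """Discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def expected_second_row(first_row):
    """Expected B_{2,k} of a maximally random network, given first_row[k] = B_{1,k}.

    Assumes at least one node has nonzero degree and ignores loops (tree-like limit).
    """
    total = sum(m * b for m, b in enumerate(first_row))
    # p[m]: probability that a node reached by following a link has degree m
    p = [m * b / total for m, b in enumerate(first_row)]
    # q[j] = p[j + 1]: such a neighbor adds j new nodes to the second shell
    q = p[1:]
    kmax = len(first_row) - 1
    row = [0.0] * (kmax * (kmax - 1) + 1)
    power = [1.0]  # l-fold convolution of q, starting from l = 0
    for l, b in enumerate(first_row):
        if l > 0:
            power = convolve(power, q)
        for k, weight in enumerate(power):
            row[k] += b * weight
    return row


# Hypothetical check: 10 nodes, all of degree 3.
row = expected_second_row([0, 0, 0, 10])
```

In this 3-regular toy case every node is predicted to have $3 \times 2 = 6$
second neighbors, so all ten nodes land in column $k = 6$. Comparing such a
prediction with a measured second row is one way to expose structure beyond
the degree distribution.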

Results. — The intuition one gains simply by looking 
at these portraits is of great value [6]. Classification and
comparison are immediate (Figs. 5, 7). Dimensionality
and regularity are encoded in the overall slope and row
variances (Fig. 6), while small-world behavior is displayed
in the "aspect ratio" (Fig. 4). Even correlation effects are
discernible in the fine-scale structure of the higher rows
(Fig. 8). Properties such as assortativity were previously
impossible to visualize for even moderately sized networks.

Fig. 2: (color online) (a) A B-matrix with a logarithmic
color scale (the white background indicates zero elements of
B). The degree distribution is slightly visible in the first row.
The "turning point" about row 4 represents finite-size effects.
Shown is the network of the 10% most connected actors on
IMDB [2]. (b) The same matrix with a logarithmic horizontal
axis. The degree distribution is now clearly visible.

Here is a list summarizing the contents and "moral of 
the story" for each panel, numbered by figure: 

2. The algorithm is cheap enough to visualize very large 
matrices, as indicated by this example and its nearly 
30000 columns. This also shows that a large amount 
of information is present in the matrix, far beyond 
the degree distribution encoded in the first row. 

3. A large random network's B-matrix looks like the
average of an ensemble of such networks (of the same
size) (panels a,b). A phase transition such as
percolation is immediately visible (c,d).

Fig. 3: (color online) Erdős-Rényi (ER) graphs [15]. (a) One
graph with $N = 1000$ nodes and $p = 0.008$. (b) The average
of 100 graphs from (a). Visualizing percolation: $N = 10^4$, (c)
below percolation, $p = (1.1N)^{-1}$; (d) at percolation, $p = 1/N$.

4. The transition to small world is visible in the changing 
'aspect ratio' of the portrait. These portraits have all 
been padded to the same dimensions. 

5. Scale-free networks with identical numbers of nodes 
and power law exponents can still give radically dif- 
ferent portraits. Thus the portrait can be used to 
infer a generating mechanism or scale-free model, by 
providing information beyond the degree distribution. 

6. Lattice defects, dimensionality (since shell sizes grow
as the distance raised to the power of the dimension
minus one), and "regularity" are all visible in
the portrait. This is useful, since the change in edges
between a periodic and a non-periodic lattice is small,
though very specific, yet it leads to a massive change
in the corresponding portraits.

7. Real-world networks can give remarkably different
portraits, but some classes of real-world networks can
look similar (shown here with four metabolic
networks in panels c-f). The four metabolic networks
look quite similar despite widely varying scales in
both axes. This suggests a simple scaling procedure:
stretch one or both axes until the portraits overlap.

8. Correlation effects may still be visible in the higher
rows of the portrait. Shown here is a highly disassortative
metabolic network; note the vertical structures in the
higher rows. Rewiring or perturbing this network to
raise the assortativity destroys these structures.

Fig. 4: (color online) The emergence of small world. Shown are
Newman-Watts-Strogatz graphs [16] with $N = 1000$; $k = 4$;
and $p = 1/20$, $1/10$, $1/5$, and $2/5$; (a)-(d), respectively.



Comparing portraits. — The portraits are useful for 
showing an intuitive picture of a network, but they can 
also be used quantitatively. A simple "distance"
comparing networks $G$ and $G'$ may be defined, using their
respective B-matrices (footnote 1). Motivated by the Kolmogorov-Smirnov
(KS) test [10], we introduce the following statistic between
corresponding pairs of rows $B_\ell$ and $B'_\ell$:

$$K_\ell = \max_k \left| C_{\ell,k} - C'_{\ell,k} \right|, \tag{5}$$

where $C$ is the matrix of cumulative distributions of $B$,

$$C_{\ell,k} = \frac{\sum_{k' \le k} B_{\ell,k'}}{\sum_{k'} B_{\ell,k'}}. \tag{6}$$

The greater impact of lower shells on network properties
(such as the average path length [11,12]) can be accounted
for by assigning weights $\alpha_\ell$, based on shell "mass," for
instance:

$$\alpha_\ell = \sum_k B_{\ell,k} + \sum_k B'_{\ell,k}. \tag{7}$$

Finally, we choose a scalar distance $\Delta$, generated by

$$\Delta(G,G') = \Delta(B,B') = \frac{\sum_\ell \alpha_\ell K_\ell}{\sum_\ell \alpha_\ell}. \tag{8}$$

See Fig. 9 for some concrete examples.
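The row-wise statistic and the weighted distance can be sketched in a few
lines of Python. This is a hedged illustration rather than the authors'
implementation: it assumes each row's cumulative distribution is normalized
by the row sum, that B-matrices are stored as dictionaries `{(l, k): count}`
with only nonempty shells recorded, and that a row missing from one network
is treated as an all-zero cumulative distribution. The names `row_cdf` and
`portrait_distance` are hypothetical.

```python
def row_cdf(row, kmax):
    """Normalized cumulative distribution of one row, for k = 0..kmax.

    row is a dict {k: count}; an empty row yields all zeros (sketch convention).
    """
    total = sum(row.values())
    cdf, running = [], 0
    for k in range(kmax + 1):
        running += row.get(k, 0)
        cdf.append(running / total if total else 0.0)
    return cdf


def portrait_distance(B1, B2):
    """Return (K, delta): K[l] is the row-wise KS-like statistic, delta the weighted mean."""
    rows = sorted({l for l, _ in B1} | {l for l, _ in B2})
    kmax = max(k for _, k in list(B1) + list(B2))
    K, alpha = {}, {}
    for l in rows:
        r1 = {k: c for (ll, k), c in B1.items() if ll == l}
        r2 = {k: c for (ll, k), c in B2.items() if ll == l}
        c1, c2 = row_cdf(r1, kmax), row_cdf(r2, kmax)
        K[l] = max(abs(a - b) for a, b in zip(c1, c2))
        alpha[l] = sum(r1.values()) + sum(r2.values())  # shell "mass" weight
    delta = sum(alpha[l] * K[l] for l in rows) / sum(alpha.values())
    return K, delta


# Hypothetical toy portraits: a 3-node path and a triangle.
path_B = {(1, 1): 2, (1, 2): 1, (2, 1): 2}
tri_B = {(1, 2): 3}
K, delta = portrait_distance(path_B, tri_B)
```

Because the weights depend on both portraits symmetrically and the statistic
uses an absolute difference, swapping the two arguments leaves the result
unchanged, and comparing a portrait with itself gives zero.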

We apply this distance to four networks, summarized
in Fig. 9: two Erdős-Rényi (ER) networks
with equal $N$ and $p$, and a Barabási-Albert (BA) versus a
Molloy-Reed (MR) network built from the BA degree
distribution. The plot indicates the value of the test statistic,
Eq. (5), while the table indicates the values of $\Delta$, from Eq. (8).
The plot shows that the two ER networks agree very well
with each other, while the BA and MR networks agree
at first, but differences appear in higher rows (since BA
has correlation effects missing in MR). The table values
all agree with expectations: the ER graphs are very close
to each other, the BA and MR graphs are farther apart
from each other, and both BA and MR are very far from
the ER networks.

1 We assume that the networks are of comparable size.
Empirically, the B-matrices may be scaled and normalized:
$\{\ell, k, B\} \mapsto \{\ell/L, k/K, B/N\}$, where $L$ and $K$ are the
largest shell number and largest degree (of any order),
respectively. When the number of rows is small, one may first
replace the B-matrix by a suitably smoothed surface (applying
a spline procedure), then proceed with scaling.

Fig. 5: (color online) Scale-free models. The average of 100
instances of the (undirected) Krapivsky-Redner ($r = 1/2$) [17];
Barabási-Albert (BA) ($m = 2$) [18]; and Molloy-Reed (MR)
(drawn from $P(k) \sim k^{-3}$) [9] networks; as well as the
(1,3)-flower at generation 6 [19]; (a)-(d), respectively. All have
$N = 2732$, $\gamma = 3$, but $\langle k \rangle$ varies. Note that (d) has been
darkened slightly for clarity.

Mathematically, it remains an open question whether $\Delta$ is a
metric or semi-metric (pseudometric). It is obvious from
Eqs. (5) and (8) that $\Delta(x,y) \ge 0$ and $\Delta(x,y) = \Delta(y,x)$.
Furthermore, the numbers in Fig. 9 satisfy the triangle
inequality, but does this hold generally? The final issue
at hand concerns indiscernibility, $\Delta(x,y) = 0 \iff x = y$.
Discernibility in $\Delta(B,B')$ appears to hold, but there
exist two non-isomorphic graphs, the dodecahedral and
Desargues graphs, which have identical B's, disproving
discernibility in $\Delta(G,G')$, if only because their B-matrices
are indiscernible (footnote 2).

Conclusions and future work. — To summarize, B- 
matrices offer us an unambiguous way to visualize and dis- 
criminate between various complex networks. With little 
practice one can readily pick out the patterns that distinguish
one case from another: for example, the metabolic
networks (Fig. 7) have a distinctly similar appearance, with
a prominent "knot" near the center of the portraits. Even
small changes in structure induce visible changes in the
B-matrix (Figs. 3b, d, and 6), the largest changes being
induced by the removal or addition of links of highest
betweenness centrality [13].

Fig. 6: (color online) Regular 40 x 40 lattices with defects. (a)
A periodic and (b) non-periodic lattice; (c) a lattice with
skew-periodic boundaries; and (d) a periodic lattice with a random
5 percent of all nodes missing. Observe the strong linear slope,
indicating the underlying two-dimensional lattice, as well as
the narrowness of the distributions in (a), (c), and (d) due
to the regularity of the periodic lattice. Similarly, 1D lattices
show a constant (vertical) line and 3D lattices exhibit quadratic
growth.

We have also introduced a distance, associated with the
B-matrix, that quantifies the differences between complex
networks. The distance between networks belonging to the
same ensemble is small (Figs. 3a,b, and 9), but it grows
larger for networks in different ensembles (Fig. 9).

Several generalizations come to mind. Eq. (1)
encompasses directed graphs and may be extended to weighted
graphs: shells are defined by a set of weights
$W = \{w_1, w_2, \ldots, w_d\}$ and could be found by Dijkstra's
algorithm [14]. One may also generalize B to edges by
defining the distance from a node $v_i$ to an edge $(v_j, v_k)$ as the
mean of distances $d(v_i, v_j)$ and $d(v_i, v_k)$ (footnote 3). This "edges
matrix" has half-integer rows with row 1/2 encoding the
degree distribution, $B_{1/2,k} = N P(k)$, and so forth.
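The edge generalization can be sketched as follows, assuming a simple
undirected graph (no self-loops or multi-edges) given as an adjacency
dictionary. Rows are keyed by the mean distance itself, stored as a float
(half-integers are exactly representable). The function names are
hypothetical, and the loop over all edges for every node is chosen for
clarity rather than speed.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Return {node: hop distance from source} for all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def edge_b_matrix(adj):
    """B[(l, k)] = number of nodes that have exactly k edges at mean distance l."""
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    B = Counter()
    for node in adj:
        dist = bfs_distances(adj, node)
        per_row = Counter()
        for edge in edges:
            u, v = tuple(edge)
            if u in dist and v in dist:
                # node-to-edge distance: mean of the distances to both endpoints
                per_row[(dist[u] + dist[v]) / 2] += 1
        for l, k in per_row.items():
            B[(l, k)] += 1
    return dict(B)


# Hypothetical toy example: the path graph a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
edge_B = edge_b_matrix(path)
```

In this toy case row 1/2 reads `{1: 2, 2: 1}`, which is the degree
distribution of the path (two nodes of degree 1, one of degree 2).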

Among the most promising applications of B-matrices,
besides identification and comparisons, is the question of
the information content of complex graphs. The portraits
can be compressed by applying conventional algorithms.
The size of the compressed files could serve as a measure of
information content (the difference in entropy of stochastic
scale-free networks, vs. that of the highly ordered flower,
in Fig. 5 is apparent even visually).
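A rough sketch of this compression idea, using zlib as a stand-in for
"conventional algorithms": the choice of zlib, the row-major integer
serialization, and both toy matrices below are assumptions made for
illustration only. The toy matrices are synthetic and are not portraits of
any network discussed here.

```python
import random
import zlib
from array import array


def compressed_size(matrix):
    """Bytes needed for a zlib-compressed, row-major dump of a dense integer matrix."""
    flat = array("I", (x for row in matrix for x in row))
    return len(zlib.compress(flat.tobytes(), 9))


rows, cols, n_nodes = 20, 200, 1000

# Hypothetical "ordered" portrait: every node has the same shell size in each row,
# so each row holds a single nonzero entry (as for a perfectly regular lattice).
ordered = [[n_nodes if k == 4 * (l + 1) else 0 for k in range(cols)]
           for l in range(rows)]

# Hypothetical "disordered" portrait: the same nodes scattered over many shell sizes.
rng = random.Random(0)
disordered = [[0] * cols for _ in range(rows)]
for row in disordered:
    for _ in range(n_nodes):
        row[rng.randrange(cols)] += 1

size_ordered = compressed_size(ordered)
size_disordered = compressed_size(disordered)
```

The ordered matrix compresses to far fewer bytes than the disordered one.
Any real use would need a fixed serialization and matrix size so that
compressed sizes are comparable across networks.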



2 These graphs are both discussed in the final section. 



3 $B_{\ell,k}$ is now the number of nodes with $k$ edges at distance $\ell$.

Fig. 7: (color online) Several real-world networks. (a) The
western states power grid (unweighted) [1]; (b) US airlines
network [20]; and directed metabolic networks for H. influenzae,
R. capsulatus, M. jannaschii, and C. elegans [3], (c)-(f),
respectively. The metabolic networks appear similar to one
another yet unlike the power grid and airlines networks.



Another interesting problem would be to use the 
"smoothness" of the matrices to create some quantitative 
measure of regularity, perhaps based on the variances of 
each row. This could also provide a useful measure of in- 
formation content as well as symmetry and perhaps other 
characteristics. In several instances, we have discussed the 
"slope" of the matrix without giving specifics. While it's 
easy to identify dimensionality from the lattices of Fig. 6,
other networks are of higher dimension with broader row 
distributions and it's more difficult to pick out the slope 
visually. A specific fitting procedure or other technique 
may be useful. 
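One simple way to make "slope" and "regularity" concrete is to compute the
mean and variance of the shell size in each row. The sketch below does this
for a small periodic square lattice. The helper names, the dictionary
representation of B, and the choice of mean and variance as the row
summaries are assumptions of this example, not a procedure defined in the
text.

```python
from collections import Counter, deque


def torus(side):
    """Adjacency dict of a side x side square lattice with periodic boundaries."""
    return {
        (x, y): [((x + 1) % side, y), ((x - 1) % side, y),
                 (x, (y + 1) % side), (x, (y - 1) % side)]
        for x in range(side) for y in range(side)
    }


def b_matrix(adj):
    """B[(l, k)] = number of nodes with exactly k nodes at distance l (k >= 1 only)."""
    B = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        shells = Counter(d for d in dist.values() if d > 0)
        for l, k in shells.items():
            B[(l, k)] += 1
    return dict(B)


def row_stats(B):
    """Return {l: (mean, variance)} of the shell size k in each row of B."""
    stats = {}
    for l in sorted({l for l, _ in B}):
        row = {k: c for (ll, k), c in B.items() if ll == l}
        n = sum(row.values())
        mean = sum(k * c for k, c in row.items()) / n
        var = sum(c * (k - mean) ** 2 for k, c in row.items()) / n
        stats[l] = (mean, var)
    return stats


stats = row_stats(b_matrix(torus(7)))
```

On the 7 x 7 periodic lattice the first three row means are 4, 8, and 12
(linear growth, as expected in two dimensions) and every row variance is
zero. A fitted growth exponent of the row means plus one would then estimate
the dimension, and nonzero variances would flag departures from regularity.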

Given the degree probability distribution (the first row
of the B-matrix) there exist algorithms to construct
complex networks that satisfy that degree distribution [9].
Perhaps the most important open question is the inverse
of obtaining the B-matrices: given a B-matrix, find a
procedure to construct random complex nets belonging to the
ensemble represented by it. This is related to the question
of satisfiability: constructing a random net that satisfies just
the $P_1(k)$ degree distribution is already non-trivial [21],
and this becomes more complicated as higher-order $P_\ell$'s are
added in. There exist already examples of procedures for obtaining
maximally random nets with more than the $P_1(k)$
constraint; for example, in [22] it is shown how to satisfy both
the degree distribution and arbitrary degree-degree
correlations.

Fig. 8: (color online) (a) The original metabolic network of
M. genitalium [3] with assortativity A = -0.174216 and (b)
with A = 0.000757 after permuting random edge pairs while
preserving the degree distribution. The fine-scale structure in
the uppermost shells of (a) is no longer present in (b).

Regarding the famous Graph Isomorphism problem,
consider the non-isomorphic dodecahedral and Desargues
graphs; both are cubic distance-regular with 20 nodes [23]
and both have identical B-matrices (footnote 4), so B does not
uniquely encode a network. In practice, the probability
of two large, non-isomorphic graphs chosen from a large
ensemble having identical B-matrices appears to be
vanishingly small, since the slightest difference will propagate
throughout B. The dodecahedral and Desargues graphs
are very similar in appearance, and the specific
relationship between their edge sets that allows for identical B's
is unlikely to arise at random. We propose that B is a
"very good" answer to graph isomorphism. It is also worth
noting that the Desargues and dodecahedral graphs have
different edge matrices: we conjecture that graphs are
uniquely identified by both matrices together. The true power
of B as a measure of graph isomorphism remains an open
question and warrants further study.
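The coincidence of the two B-matrices can be checked directly. The sketch
below builds both graphs from the generalized Petersen construction, with the
dodecahedral graph as GP(10,2) and the Desargues graph as GP(10,3). That
construction is a standard one introduced here for convenience and is not
taken from the text. Bipartiteness is used as a simple witness that the two
graphs are not isomorphic. All function names are hypothetical.

```python
from collections import Counter, deque


def generalized_petersen(n, k):
    """Adjacency dict of GP(n, k): an outer n-cycle, n spokes, and inner step-k links."""
    adj = {(ring, i): set() for ring in ("outer", "inner") for i in range(n)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for i in range(n):
        link(("outer", i), ("outer", (i + 1) % n))
        link(("outer", i), ("inner", i))
        link(("inner", i), ("inner", (i + k) % n))
    return adj


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def b_matrix(adj):
    """B[(l, k)] = number of nodes with exactly k nodes at distance l (k >= 1 only)."""
    B = Counter()
    for node in adj:
        shells = Counter(d for d in bfs_distances(adj, node).values() if d > 0)
        for l, size in shells.items():
            B[(l, size)] += 1
    return dict(B)


def is_bipartite(adj):
    """Two-color a connected graph by BFS distance parity and check every edge."""
    dist = bfs_distances(adj, next(iter(adj)))
    return all((dist[a] + dist[b]) % 2 == 1 for a in adj for b in adj[a])


dodecahedral = generalized_petersen(10, 2)
desargues = generalized_petersen(10, 3)
same_portrait = b_matrix(dodecahedral) == b_matrix(desargues)
```

Both portraits have a single nonzero entry per row, with shell sizes 3, 6, 6,
3, 1 shared by all 20 nodes, yet only the Desargues graph is bipartite, so
the graphs differ.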

Finally, it is worth noting that the construction of B 
requires an $O(N^2)$ algorithm, which may preclude its use
for extremely large networks. However, this algorithm is 
easily parallelized by spreading the starting nodes over 
multiple machines. 
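The parallel structure can be sketched as follows: the starting nodes are
split into chunks, each worker builds a partial B-matrix from its own
breadth-first searches, and the partial counts are summed. Threads are used
here only to keep the example self-contained (in CPython they would not give
a real speedup for this workload); separate processes or machines would take
their place in practice. The function names and the chunking scheme are
assumptions of this example.

```python
from collections import Counter, deque
from concurrent.futures import ThreadPoolExecutor


def partial_b_matrix(adj, sources):
    """B-matrix contribution from BFS runs started at the given source nodes only."""
    B = Counter()
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        shells = Counter(d for d in dist.values() if d > 0)
        for l, k in shells.items():
            B[(l, k)] += 1
    return B


def b_matrix_chunked(adj, n_workers=4):
    """Split the starting nodes over workers and add up the partial B-matrices."""
    nodes = list(adj)
    chunks = [nodes[i::n_workers] for i in range(n_workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for part in pool.map(lambda chunk: partial_b_matrix(adj, chunk), chunks):
            total.update(part)  # Counter.update adds counts entry by entry
    return dict(total)


# Hypothetical toy example: a ring of 12 nodes.
ring = {i: [(i - 1) % 12, (i + 1) % 12] for i in range(12)}
B_ring = b_matrix_chunked(ring)
```

Since each starting node contributes independently, the merged result equals
the serial one regardless of how the nodes are divided.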



We thank H. Rozenfeld for discussions. We thank 
the NSF for support from a National Science Founda- 
tion Graduate Research Fellowship (JPB) and awards 
no. DMS-0404778 (EB) and PHY0555312 (DbA). 



4 Distance-regular graphs will have exactly one nonzero element 
per row in B; in principle, this may be exploited to search for undis- 
covered distance-regular graphs by rewiring edges along some scheme 
to minimize the number of nonzero elements per row. This would 
likely be cost-prohibitive in practice.

Fig. 9: (top) Row-wise statistic $K_\ell$: two ER graphs with
$N = 10^4$ and $p = 0.002$; and a BA (diameter 10) versus an MR
network ($P(k) \sim k^{-3}$, diameter 14), both with $N = 5 \times 10^4$.
Both the BA and MR networks have the same degree
distribution, so the first rows agree. Differences in, e.g., assortativity,
soon become evident. (bottom) Table containing the values of
$\Delta$, given by Eq. (8), for the four networks. This table shows
that the two ER graphs are very close to each other, while the MR
and BA graphs are somewhat far apart from each other and
very far from the ER graphs, as expected.



REFERENCES 

[1] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).

[2] L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley,
Proc. Natl. Acad. Sci. USA 97, 11149 (2000).

[3] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L.
Barabási, Nature 407, 651 (2000).

[4] B. McKay, Congr. Numer. 30, 45 (1981).

[5] L. J. Billera, S. P. Holmes, and K. Vogtmann, Adv. in Appl.
Math. 27, 733 (2001), and references therein.

[6] Many more B-matrices, and animations, are currently
available at http://people.clarkson.edu/~qd00/

[7] M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002).

[8] T. H. Cormen, C. E. Leiserson, and R. L. Rivest,
Introduction to Algorithms (The MIT Press, Cambridge, Mass.,
1990).

[9] M. Molloy and B. Reed, Comb. Probab. Comput. 7, 295
(1998).

[10] W. J. Conover, Practical Nonparametric Statistics (Wiley,
1998).

[11] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Phys.
Rev. E 64, 026118 (2001).

[12] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin,
Nucl. Phys. B 653, 307 (2003).

[13] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci.
USA 99, 7821 (2002).

[14] E. W. Dijkstra, Numer. Math. 1, 269 (1959).

[15] P. Erdős and A. Rényi, Publ. Math. 6, 290 (1959).

[16] M. E. J. Newman and D. J. Watts, Phys. Rev. E 60, 7332
(1999).

[17] P. L. Krapivsky and S. Redner, Phys. Rev. E 63, 066123
(2001).

[18] A.-L. Barabási and R. Albert, Science 286, 509 (1999).

[19] H. D. Rozenfeld, S. Havlin, and D. ben-Avraham, New J.
Phys. 9, 175 (2007).

[20] V. Batagelj and A. Mrvar, Pajek datasets (2006), URL
http://vlado.fmf.uni-lj.si/pub/networks/data/

[21] M. Molloy and B. Reed, Random Structures and
Algorithms 6, 161-179 (1995).

[22] S. Weber and M. Porto, Phys. Rev. E 76, 046111 (2007).

[23] A. E. Brouwer, A. M. Cohen, and A. Neumaier,
Distance-Regular Graphs (Springer-Verlag, New York, 1989).






